Text Similarity

State of the Art

BM25: Term frequency weight

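$$
\mathrm{BM25}(w, D) = \frac{1 + k}{1 + k \Big/ \left( \dfrac{c(w)}{1 + b\,\left(|D| - |\bar{D}|\right) / |\bar{D}|} \right)}
$$

Here c(w) is the count of word w in document D, |D| is the length of D, |D¯| is the average document length in the collection, k ≥ 0 controls how quickly the weight saturates, and b ∈ [0, 1] controls the strength of length normalization.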
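A short worked step may help connect this to the form of BM25 that is more commonly quoted. Writing the weight as (1+k)/(1+k/x) with x = c(w) / (1 + b(|D| − |D¯|)/|D¯|), and multiplying numerator and denominator by c(w) (assuming c(w) > 0), gives:

$$
\frac{1 + k}{1 + k/x}
= \frac{(1 + k)\, c(w)}{c(w) + k \left( 1 + b\,\dfrac{|D| - |\bar{D}|}{|\bar{D}|} \right)}
= \frac{(k + 1)\, c(w)}{c(w) + k \left( 1 - b + b\,\dfrac{|D|}{|\bar{D}|} \right)}
$$

The last step only uses 1 + b(|D| − |D¯|)/|D¯| = 1 − b + b|D|/|D¯|.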

Illustration for $\frac{1+k}{1+k/x}$:
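The saturating shape of this transformation can also be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `bm25_tf` and the defaults k = 1.2 and b = 0.75 are assumptions chosen for illustration and do not come from the source.

```python
def bm25_tf(count, doc_len, avg_doc_len, k=1.2, b=0.75):
    """BM25 term-frequency weight for one word in one document.

    count       -- c(w), raw count of the word in the document
    doc_len     -- |D|, length of the document
    avg_doc_len -- average document length in the collection
    k, b        -- illustrative defaults (assumed, not from the source)
    """
    if count == 0:
        # A word that does not occur gets zero weight.
        return 0.0
    # Length-normalised count: x = c(w) / (1 + b * (|D| - avg) / avg)
    x = count / (1.0 + b * (doc_len - avg_doc_len) / avg_doc_len)
    # Sublinear transformation (1 + k) / (1 + k / x)
    return (1.0 + k) / (1.0 + k / x)


if __name__ == "__main__":
    # For an average-length document, x equals the raw count.
    for c in (1, 2, 5, 10, 100):
        print(c, round(bm25_tf(c, doc_len=100, avg_doc_len=100), 3))
```

The printed weights grow with the count but stay below 1 + k, which is the upper bound of the transformation. A document longer than average yields a smaller x, and therefore a smaller weight, for the same raw count.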





Reference

Text Mining: https://www.coursera.org/learn/text-mining